Analyzing Stemming Approaches for Turkish Multi-Document Summarization
Abstract
In this study, we analyzed the effects of applying different levels of stemming, such as fixed-length word truncation and morphological analysis, to multi-document summarization (MDS) for Turkish, an agglutinative and morphologically rich language. We constructed a manually annotated MDS data set and, to the best of our knowledge, reported the first results on Turkish MDS. Our results show that a simple fixed-length word truncation approach performs slightly better than no stemming, whereas applying complex morphological analysis does not improve Turkish MDS.
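As a rough illustration of what fixed-length word truncation means, the following Python sketch keeps only the first few characters of each token. The function name `truncate_stem` and the default prefix length of 5 are assumptions made for this example; the abstract does not state which length the study used, and this is not the authors' implementation.

```python
def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Fixed-length truncation: keep at most the first `prefix_len` characters.

    `prefix_len` is a hypothetical setting chosen for illustration only.
    Casing is left untouched, because Turkish dotted/dotless i needs
    locale-aware lowercasing that plain str.lower() does not provide.
    """
    return word[:prefix_len]


def truncate_tokens(tokens: list[str], prefix_len: int = 5) -> list[str]:
    """Apply fixed-length truncation to every token in a list."""
    return [truncate_stem(t, prefix_len) for t in tokens]


# Inflected forms of "kitap" (book) collapse onto the same truncated form.
forms = ["kitap", "kitaplar", "kitaplardan"]
stems = truncate_tokens(forms)
```

With this setting, all three inflected forms map to the same string, which is the effect that lets a summarizer treat them as one term without any morphological analysis. Words shorter than the prefix length are returned unchanged.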
Similar resources
AllSummarizer system at MultiLing 2015: Multilingual single and multi-document summarization
In this paper, we evaluate our automatic text summarization system in a multilingual context. We participated in both the single-document and multi-document summarization tasks of the MultiLing 2015 workshop. Our method involves clustering the document sentences into topics using a fuzzy clustering algorithm. Then each sentence is scored according to how well it covers the various topics. This is done us...
Multilingual Summarization: Dimensionality Reduction and a Step Towards Optimal Term Coverage
In this paper we present three term weighting approaches for multi-lingual document summarization and give results on the DUC 2002 data as well as on the 2013 Multilingual Wikipedia feature articles data set. We introduce a new interval-bounded nonnegative matrix factorization. We use this new method, latent semantic analysis (LSA), and latent Dirichlet allocation (LDA) to give three term-weight...
Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization
Preprocessing is a preliminary step in many fields, including IR and NLP. The effect of basic preprocessing settings on English text summarization is well studied. However, to the best of our knowledge, no such effort exists for the Urdu language. In this study, we analyze the effect of basic preprocessing settings for single-document text summarization for Urdu, on a benchmark co...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. Sifting through such a massive amount of data to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Ultra-stemming and Statistical Summarization at INEX 2013 Tweet Contextualization Track
According to the organizers, the objective of the 2013 INEX Tweet Contextualization Task is: “...The Tweet Contextualization aims at providing automatically information a summary that explains the tweet. This requires combining multiple types of processing from information retrieval to multi-document summarization including entity linking.” We present the Cortex summarizer applied to the INEX 2...